Search | BVS Regional Portal

The key role of absolute risk in the disclosure risk assessment of public data releases.

Hotz, V Joseph; Bollinger, Christopher R; Komarova, Tatiana; Manski, Charles F; Moffitt, Robert A; Nekipelov, Denis; Sojourner, Aaron; Spencer, Bruce D.

Proc Natl Acad Sci U S A ; 121(11): e2321882121, 2024 Mar 12.

Article in English | MEDLINE | ID: mdl-38442168

Subjects

Disclosure

Balancing data privacy and usability in the federal statistical system.

Hotz, V Joseph; Bollinger, Christopher R; Komarova, Tatiana; Manski, Charles F; Moffitt, Robert A; Nekipelov, Denis; Sojourner, Aaron; Spencer, Bruce D.

Proc Natl Acad Sci U S A ; 119(31): e2104906119, 2022 Aug 02.

Article in English | MEDLINE | ID: mdl-35878030

ABSTRACT

The federal statistical system is experiencing competing pressures for change. On the one hand, for confidentiality reasons, much socially valuable data currently held by federal agencies is either not made available to researchers at all or only made available under onerous conditions. On the other hand, agencies which release public databases face new challenges in protecting the privacy of the subjects in those databases, which leads them to consider releasing fewer data or masking the data in ways that will reduce their accuracy. In this essay, we argue that the discussion has not given proper consideration to the reduced social benefits of data availability and their usability relative to the value of increased levels of privacy protection. A more balanced benefit-cost framework should be used to assess these trade-offs. We express concerns both with synthetic data methods for disclosure limitation, which will reduce the types of research that can be reliably conducted in unknown ways, and with differential privacy criteria that use what we argue is an inappropriate measure of disclosure risk. We recommend that the measure of disclosure risk used to assess all disclosure protection methods focus on what we believe is the risk that individuals should care about, that more study of the impact of differential privacy criteria and synthetic data methods on data usability for research be conducted before either is put into widespread use, and that more research be conducted on alternative methods of disclosure risk reduction that better balance benefits and costs.

Subjects

Computer Security , Confidentiality , Privacy , Data Collection , Disclosure , Federal Government , Government Agencies

Node sampling for protein complex estimation in bait-prey graphs.

Scholtens, Denise M; Spencer, Bruce D.

Stat Appl Genet Mol Biol ; 14(4): 391-411, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26226130

ABSTRACT

In cellular biology, node-and-edge graph or "network" data collection often uses bait-prey technologies such as co-immunoprecipitation (CoIP). Bait-prey technologies assay relationships or "interactions" between protein pairs, with CoIP specifically measuring protein complex co-membership. Analyses of CoIP data frequently focus on estimating protein complex membership. Due to budgetary and other constraints, exhaustive assay of the entire network using CoIP is not always possible. We describe a stratified sampling scheme to select baits for CoIP experiments when protein complex estimation is the main goal. Expanding upon the classic framework in which nodes represent proteins and edges represent pairwise interactions, we define generalized nodes as sets of adjacent nodes with identical adjacency outside the set and use these as strata from which to select the next set of baits. Strata are redefined at each round of sampling to incorporate accumulating data. This scheme maintains user-specified quality thresholds for protein complex estimates and, relative to simple random sampling, leads to a marked increase in the number of correctly estimated complexes at each round of sampling. The R package seqSample contains all source code and is available at http://vault.northwestern.edu/~dms877/Rpacks/.
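The definition of generalized nodes in the abstract above can be illustrated with a minimal sketch. It assumes that "sets of adjacent nodes" means mutually adjacent nodes, in which case two nodes belong to the same generalized node exactly when their closed neighborhoods (neighbors plus the node itself) coincide. The function name `generalized_nodes` and the toy graph are hypothetical and are not taken from the seqSample package.

```python
from collections import defaultdict


def generalized_nodes(adjacency):
    """Group nodes into generalized nodes (candidate strata).

    Assumption: members of a generalized node are mutually adjacent and
    share the same neighbors outside the set. Under that reading, nodes
    belong together exactly when their closed neighborhoods are equal.
    """
    groups = defaultdict(set)
    for node, neighbors in adjacency.items():
        # Closed neighborhood: the node's neighbors plus the node itself.
        closed = frozenset(neighbors) | {node}
        groups[closed].add(node)
    # Sort members and groups so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical undirected toy graph: A, B, C are mutually adjacent and
# each is also adjacent to D; D has one extra neighbor, E.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D"},
}

strata = generalized_nodes(toy)
print(strata)
```

In this toy graph A, B, and C collapse into one stratum because they are interchangeable with respect to the rest of the graph, while D and E each form their own. How baits are then drawn from the strata, and how strata are redefined between rounds, is not specified in the abstract and is left out of the sketch.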

Subjects

Computational Biology/methods , Biological Models , Protein Interaction Mapping/methods , Proteins/metabolism , Algorithms , Computer Simulation , Humans , Yeasts/metabolism

When do latent class models overstate accuracy for diagnostic and other classifiers in the absence of a gold standard?

Spencer, Bruce D.

Biometrics ; 68(2): 559-66, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22017371

ABSTRACT

Latent class models are increasingly used to assess the accuracy of medical diagnostic tests and other classifications when no gold standard is available and the true state is unknown. When the latent class is treated as the true class, the latent class models provide measures of components of accuracy including specificity and sensitivity and their complements, type I and type II error rates. The error rates according to the latent class model differ from the true error rates, however, and empirical comparisons with a gold standard suggest the true error rates often are larger. We investigate conditions under which the true type I and type II error rates are larger than those provided by the latent class models. Results from Uebersax (1988, Psychological Bulletin 104, 405-416) are extended to accommodate random effects and covariates affecting the responses. The results are important for interpreting the results of latent class analyses. An error decomposition is presented that incorporates an error component from invalidity of the latent class model.
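The accuracy components named in the abstract above are related by simple complements, which a short sketch can make concrete. It assumes a 2x2 table of classifier results against the true state, with the "type I error rate" read as the false positive rate and the "type II error rate" as the false negative rate. The function name `accuracy_components` and the counts are hypothetical and do not come from the paper.

```python
def accuracy_components(tp, fn, fp, tn):
    """Return sensitivity, specificity, and their complements.

    tp, fn: truly positive cases classified positive / negative.
    fp, tn: truly negative cases classified positive / negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "type_I_error": 1 - specificity,   # false positive rate
        "type_II_error": 1 - sensitivity,  # false negative rate
    }


# Hypothetical counts, chosen only for illustration.
rates = accuracy_components(tp=90, fn=10, fp=20, tn=180)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

When no gold standard is available, these counts cannot be observed directly. A latent class analysis would substitute the latent class for the true state, which is the substitution the abstract argues can understate the error rates.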

Subjects

Biometry/methods , Classification/methods , Routine Diagnostic Tests/statistics & numerical data , Statistical Models , Chlamydia Infections/diagnosis , Routine Diagnostic Tests/standards , Hearing Loss/diagnosis , Humans , Jurisprudence , Likelihood Functions , Linear Models , Reproducibility of Results , Sensitivity and Specificity , Surveys and Questionnaires
